sufficient condition
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
We prove rich algebraic structures of the solution space for 2-layer neural networks with quadratic activation and L2 loss, trained on reasoning tasks in Abelian group (e.g., modular addition). Such a rich structure enables analytical construction of global optimal solutions from partial solutions that only satisfy part of the loss, despite its high nonlinearity.
AUnifying View of Linear Function Approximation in Off-Policy Reinforcement Learning through Matrix Splitting and Preconditioning
In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning(TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one update, FQI makes an infinite number, and Partial Fitted Q-Iteration (PFQI) performs a finite number. We show that this view is not accurate, and provide a new mathematical perspective under linear value function approximation that unifies these methods as a single iterative method solving the same linear system, but using different matrix splitting schemes and preconditioners. We show that increasing the number of updates under the same target value function, i.e., the target network technique, is a transition from using a constant preconditioner to using a data-feature adaptive preconditioner. This elucidates, for the first time, why TD convergence does not necessarily imply FQI convergence, and establishes tight convergence connections among TD, PFQI, and FQI. Our framework enables sharper theoretical results than previous work and characterization of the convergence conditions for each algorithm, without relying on assumptions about the features (e.g., linear independence). We also provide an encoder-decoder perspective to better understand the convergence conditions of TD, and prove, for the first time, that when a large learning rate doesn't work, trying a smaller one may help. Our framework also leads to the discovery of new crucial conditions on features for convergence, and shows how common assumptions about features influence convergence, e.g., the assumption of linearly independent features can be dropped without compromising the convergence guarantees of stochastic TD in the on-policy setting. This paper is also the first to introduce matrix splitting into the convergence analysis of these algorithms.
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
Chaabouni, Youssef, Gamarnik, David
We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n_1, n_2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.
The optimal betting wealth growth rate
This paper characterizes the best possible rate of growth of wealth in a Kelly betting game when repeatedly betting against a general i.i.d. null hypothesis $\mathscr{P}$, but the data are drawn i.i.d from an arbitrary alternative $Q$. We prove that it equals $\lim_{n \to \infty}n^{-1}\inf_{P \in (\mathscr P)^n)^{\circ\circ}} \mathrm{KL}(Q^n,P)$, where ${\mathscr P}^n = \{P^n: P \in \mathscr{P}\}$ and $(\mathscr {P}^n)^{\circ\circ}$ is its bipolar, i.e., this rate is achievable and one cannot do better. This quantity is in general smaller than a more popular quantity in the literature, $\mathrm{KL}_{\inf}(Q,\mathscr{P}) := \inf_{P \in \mathscr P}\mathrm{KL}(Q,P)$. If $\mathrm{KL}_{\mathrm{inf}}(\cdot,\mathscr P)$ is weakly lowersemicontinuous (w.l.s.c.) at $Q$, we show that the two quantities are equal; in particular, this happens when $\mathscr P$ is weakly compact. For simple alternatives, we provide the first matching necessary and sufficient condition for when power-one sequential tests exist (without assumptions on $\mathscr P, Q$). We also derive the optimal worst-case growth rate against composite $\mathscr Q$. We emphasize that test supermartingales on reduced filtrations suffice for all i.i.d. testing problems, and more general e-processes are not required. We thus completely generalize the recent results of Larsson et al.~\cite{larsson2025numeraire} to the sequential setting.
Pareto-Optimal Learning-Augmented Algorithms for Online Conversion Problems
This paper leverages machine-learned predictions to design competitive algorithms for online conversion problems with the goal of improving the competitive ratio when predictions are accurate (i.e., consistency), while also guaranteeing a worstcase competitive ratio regardless of the prediction quality (i.e., robustness). We unify the algorithmic design of both integral and fractional conversion problems, which are also known as the 1-max-search and one-way trading problems, into a class of online threshold-based algorithms (OTA). By incorporating predictions into design of OTA, we achieve the Pareto-optimal trade-off of consistency and robustness, i.e., no online algorithm can achieve a better consistency guarantee given for a robustness guarantee. We demonstrate the performance of OTA using numerical experiments on Bitcoin conversion.
Results
For any > 0, the -covering number of the Euclidean ball Bd(R):= {x 2Rd: kxk2 R} with radius R> 0 in the Euclidean metric is upper bounded by (1+2R/)d. Let F0 F 1 ... FT be a filtration and let X1,X2,...,XT be real random variables such that Xt is Ft-measurable, E[Xt|Ft 1]=0, |Xt| balmost surely, and PT t=1 E[X2t |Ft 1] V for some fixed V> 0and b> 0. Then for any 2(0,1), we have with probability at least 1, For any linear MDP satisfying Definition 3.1, we must have that k (s,a)k2 1/ p d for all s and a, and k,hk2 1/ p d for all and h. By Definition 3.1, we know that Ph( |s,a)= h (s,a),µh()i forms a valid probability distribution, and that k R S |dµh(s)|k2 p d. This yields the first equality. Repeating this calculation h 1more times yields the final equality. Lemma A.8. Fix some h and i